-
Seismic monitoring systems sift through seismograms in real time, searching for target events such as underground explosions. In such a monitoring system, a burst of aftershocks (minor earthquakes that occur after a major earthquake, over days or even years) can be a source of confounding signals, and a burst of aftershock signals can overload the human analysts of the monitoring system. To alleviate this burden at the onset of a sequence of events (e.g., aftershocks), a human analyst can label the first few events and start an online classifier to filter out subsequent aftershock events. We propose FewSig, an online few-shot classification model for time series data for the above use case. The FewSig framework consists of a selective model, which identifies high-confidence positive events that are used to update the models, and a general classifier, which labels the remaining events. Our specific technique uses a two-level decision tree selective model based on sliding DTW distance and a general classifier based on distance metric learning with Neighborhood Component Analysis (NCA). The algorithm demonstrates surprising robustness when tested on univariate datasets from the UEA/UCR archive. Furthermore, we show two real-world earthquake events where FewSig reduces the human effort in monitoring applications by filtering out the aftershock events.
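The selective-model idea above can be illustrated with a minimal sketch. The 1-nearest-neighbor rule and the `margin` threshold below are illustrative assumptions, not FewSig's actual two-level decision tree or NCA classifier; only the use of DTW distance for matching aftershock signals comes from the abstract.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

def label_event(event, positives, negatives, margin=1.5):
    """Return (label, is_high_confidence) for an incoming event.

    An event is labeled by its nearest labeled example under DTW; it is a
    high-confidence positive when its nearest positive is `margin` times
    closer than its nearest negative. In an online setting, high-confidence
    positives would be fed back to update the labeled set.
    """
    d_pos = min(dtw_distance(event, p) for p in positives)
    d_neg = min(dtw_distance(event, n) for n in negatives)
    label = 1 if d_pos < d_neg else -1
    confident = d_pos * margin < d_neg
    return label, confident
```

A stricter `margin` trades recall for precision in the events that are allowed to update the model, which is the essential design tension in any self-training loop.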
-
Monitoring systems have hundreds or thousands of distributed sensors gathering and transmitting real-time streaming data. The early detection of events in these systems, such as an earthquake in a seismic monitoring system, is the basis for essential tasks such as warning generation. To detect such events, it is usual to compute pairwise correlation across the disparate signals generated by the sensors. Since the data sources (e.g., sensors) are spatially separated, it is essential to consider the lagged correlation between the signals. In addition, many applications require processing a specific band of frequencies depending on the event's type, demanding a pre-processing step of filtering before computing correlations. Due to the high speed of data generation and the large number of sensors in these systems, the operations of filtering and lagged cross-correlation need to be efficient enough to provide real-time responses without data loss. This article proposes a technique named FilCorr that efficiently computes both operations in a single step. We achieve an order-of-magnitude speedup by maintaining frequency transforms over sliding windows. Our method is exact, devoid of sensitive parameters, and easily parallelizable. Besides our algorithm, we also provide a publicly available real-time system named Seisviz that employs FilCorr in its core mechanism for monitoring a seismometer network. We demonstrate that our technique is suitable for several monitoring applications, such as seismic signal monitoring, motion monitoring, and neural activity monitoring.
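"Maintaining frequency transforms over sliding windows" can be sketched with the classic sliding-DFT recurrence: when the window drops its oldest sample and appends a new one, every DFT bin can be updated in O(1), so the whole spectrum refreshes in O(N) instead of O(N log N) for a fresh FFT. This is a textbook illustration of the idea, not FilCorr's exact update.

```python
import numpy as np

def slide_dft(X, x_old, x_new, N):
    """Update the length-N DFT of a sliding window in O(N).

    If X is the DFT of window [x_old, ...], the DFT of the window that
    drops x_old and appends x_new is (X - x_old + x_new) * e^{j 2 pi k / N}.
    """
    k = np.arange(N)
    return (X - x_old + x_new) * np.exp(2j * np.pi * k / N)

# Usage: seed once with a full FFT, then update per arriving sample.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
N = 8
X = np.fft.fft(x[:N])          # transform of the first window
for t in range(N, len(x)):
    X = slide_dft(X, x[t - N], x[t], N)   # now X covers x[t-N+1 .. t]
```

Because every bin is kept current at each step, filtered cross-correlations between streams can be formed from the maintained spectra without re-transforming each window.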
-
This paper introduces a new pattern mining task that considers aligning or joining a set of time series based on an arbitrary number of subsequences (i.e., patterns) with arbitrary lengths. Joining multiple time series along common patterns can be pivotal in clustering and summarizing large time series datasets. An exact algorithm to join hundreds of time series based on multi-length patterns is impractical due to the high computational costs. This paper proposes a fast algorithm named MultiPAL to join multiple time series at interactive speed to summarize large time series datasets. The algorithm exploits Matrix Profiles of the individual time series to enable a greedy search over possible joins. The algorithm is orders of magnitude faster than the exact solution and can utilize hundreds of Matrix Profiles. We evaluate our algorithm for sequential mining on data from various real-world domains, including power management and bioacoustics monitoring.
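The Matrix Profile that MultiPAL builds on can be sketched as follows: for each length-m subsequence of one series, record the z-normalized Euclidean distance to its nearest neighbor in another series (an "AB-join"). The brute-force version below is only an illustration of the data structure; practical systems compute it with fast algorithms such as STOMP, and MultiPAL's greedy join over many profiles is not shown.

```python
import numpy as np

def matrix_profile_ab(a, b, m):
    """Brute-force AB-join matrix profile.

    profile[i] is the z-normalized Euclidean distance from a[i:i+m]
    to its nearest-neighbor subsequence anywhere in b.
    """
    def znorm(s):
        sd = s.std()
        return (s - s.mean()) / sd if sd > 0 else s - s.mean()

    subs_b = [znorm(b[j:j + m]) for j in range(len(b) - m + 1)]
    profile = np.empty(len(a) - m + 1)
    for i in range(len(a) - m + 1):
        q = znorm(a[i:i + m])
        profile[i] = min(np.linalg.norm(q - s) for s in subs_b)
    return profile
```

Low values in the profile mark shared patterns between the two series, which is exactly the signal a greedy join can exploit when deciding which series share a motif.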
-
Stream mining considers the online arrival of examples at high speed and the possibility of changes in their descriptive features or class definitions compared with past knowledge (i.e., concept drifts). Fast detection of drifts is essential to keep the predictive model updated and stable in changing environments. For many applications, such as those related to smart sensors, the high number of features is an additional challenge in terms of memory and time for stream processing. This paper presents an unsupervised and model-independent concept drift detector suitable for high-speed, high-dimensional data streams. We propose a straightforward two-dimensional data representation that allows faster processing of datasets with a large number of examples and dimensions. We developed an adaptive drift detector on this visual representation that is efficient for fast streams with thousands of features and is as accurate as existing costly methods that perform various statistical tests on each feature individually. Our method achieves better performance, measured by execution time and accuracy, in classification problems for different types of drift. The experimental evaluation on synthetic and real data demonstrates the method's versatility in several domains, including entomology, medicine, and transportation systems.
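The core idea of monitoring a cheap two-dimensional summary instead of thousands of individual features can be sketched as follows. The particular projection (norm plus distance to a reference centroid) and the pooled-standard-deviation drift score below are assumptions for illustration, not the representation or test defined by the paper.

```python
import numpy as np

def to_2d(X, ref):
    """Collapse each d-dimensional example to two scalars:
    its Euclidean norm and its distance to a reference centroid `ref`."""
    return np.c_[np.linalg.norm(X, axis=1), np.linalg.norm(X - ref, axis=1)]

def drift_score(window_a, window_b):
    """Shift between two windows of 2-D points, in pooled-std units;
    a large score suggests the stream's distribution has drifted."""
    diff = np.abs(window_a.mean(axis=0) - window_b.mean(axis=0))
    pooled = (window_a.std(axis=0) + window_b.std(axis=0)) / 2 + 1e-12
    return float((diff / pooled).max())
```

Whatever the projection, the payoff is the same: the detector's cost per window no longer grows with the number of features, only with the window length.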
-
Changes in the data distribution of streaming data (i.e., concept drifts) constitute a central issue in online data mining, mainly because these changes outdate stream learning models, reducing their predictive performance over time. A common approach adopted by real-time adaptive systems to deal with concept drifts is to employ detectors that indicate the best time for updates. However, an unrealistic assumption of most detectors is that labels become available immediately after data arrive. In this paper, we introduce an unsupervised and model-independent concept drift detector suitable for high-speed, high-dimensional data streams in realistic scenarios where labels are scarce. We propose a straightforward two-dimensional representation of the data aimed at faster processing for detection. We develop a simple adaptive drift detector on this visual representation that is efficient for fast streams with thousands of features and is as accurate as existing costly methods that perform various statistical tests. Our method achieves better performance, measured by execution time and accuracy, in classification problems for different types of drift, including abrupt, oscillating, and incremental. Experimental evaluation demonstrates the versatility of the method in several domains, including astronomy, entomology, public health, political science, and medical science.
-
An essential task on streaming time series data is to compute pairwise correlation across disparate signal sources to identify significant events. In many monitoring applications, such as geospatial monitoring, motion monitoring, and critical infrastructure monitoring, correlation is observed at various frequency bands and temporal lags. In this paper, we consider computing filtered and lagged correlation on streaming time series data, which is challenging because the computation must be "in sync" with the incoming stream for any detected events to be useful. We propose a technique to compute filtered and lagged correlation on streaming data efficiently by merging two individual operations: filtering and cross-correlation. We achieve an order-of-magnitude speedup by maintaining frequency transforms over sliding windows. Our method is exact, devoid of sensitive parameters, and easily parallelizable. We demonstrate our technique in a seismic signal monitoring application.
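The merging of the two operations rests on a standard frequency-domain identity: band-pass filtering is a mask on the spectrum, and cross-correlation at every lag is one inverse FFT of a spectral product, so both can be done in a single pass. The batch (non-streaming) sketch below shows that identity; the streaming machinery of the paper is not reproduced.

```python
import numpy as np

def filtered_lagged_xcorr(x, y, band, fs):
    """Band-pass filtering and lagged cross-correlation in one
    frequency-domain pass.

    Bins outside `band` (lo, hi in Hz, at sampling rate `fs`) are zeroed,
    and a single inverse FFT returns the circular cross-correlation of the
    filtered signals at every lag; the argmax estimates y's lag behind x.
    """
    n = len(x)
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return np.fft.irfft(np.conj(X * mask) * (Y * mask), n)
```

Doing the masking and the correlation product in the same pass is what saves the separate time-domain filtering step for every pair of signals.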
-
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings, or any population-focused measurable trait data. The integration of trait and biological attribute information with an ever-increasing body of chemical, environmental, and biological data greatly facilitates computational analyses, and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical, and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
-
This dataset contains electric power consumption data from the Los Alamos Public Utility Department (LADPU) in New Mexico, USA. The data were collected by Landis+Gyr smart meter devices from 1,757 households at North Mesa, Los Alamos, NM. The sampling rate is one observation every fifteen minutes (i.e., 96 observations per day). For most customers, the data span about six years, from July 30, 2013 to December 30, 2019; for some customers, the period is shorter. The dataset is provided in its original format, without cleaning or pre-processing; the only procedure performed was anonymization. Thus, the data are not normalized and contain missing values and duplicated entries (i.e., more than one measurement for the same time). However, these issues represent only a small portion of the data.
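Because the data ship with duplicates and gaps, a first tidying pass is usually needed before analysis. The sketch below assumes a long-format table; the column names ("meter_id", "timestamp", "kwh") and any file name are assumptions about the released layout, not documented facts about this dataset.

```python
import pandas as pd

def tidy_meter_data(df):
    """De-duplicate readings and re-index each meter onto the regular
    15-minute grid, so missing observations appear explicitly as NaN.

    Expects columns "meter_id", "timestamp" (datetime), "kwh" -- these
    names are assumptions about the file layout.
    """
    df = df.drop_duplicates(subset=["meter_id", "timestamp"])
    return (df.set_index("timestamp")
              .groupby("meter_id")["kwh"]
              .resample("15min")
              .mean())
```

With a hypothetical file name, usage would look like `tidy_meter_data(pd.read_csv("ladpu.csv", parse_dates=["timestamp"]))`; making the gaps explicit NaNs keeps downstream imputation or exclusion an explicit, auditable choice.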
